# Zero-shot Image Captioning
mBLIP BLOOMZ-7B
MIT
mBLIP is a multilingual vision-language model based on the BLIP-2 architecture, supporting image caption generation and visual question answering tasks in 96 languages.
Image-to-Text
Transformers
Supports Multiple Languages

Gregor
21
1
mBLIP mT0-XL
MIT
mBLIP is a multilingual vision-language model based on the BLIP-2 architecture, supporting image caption generation and visual question answering tasks in 96 languages.
Image-to-Text
Transformers
Supports Multiple Languages

Gregor
374
14